# Description of the Data

## listings

name, host, host name, neighborhood, neighborhood group, latitude, longitude, room type, price, minimum number of nights, number of reviews, last review, reviews per month, host listing number, availability, number of reviews, license

## reviews

listing id, id, date, reviewer id, reviewer name, comments

## calendar

Details about bookings for each listing over the next year: listing, date, available, price, minimum nights

The dataset provides 1,836 unique listings for the Washington, DC area.

Over 30,000 reviews were left between November 2010 and December 2021.

The main topics that I will be analyzing:

Demand and Price Analysis

User Review

Other interesting things

Possible data analysis

Spatial Analysis

This section looks at the locations of Airbnb listings in DC.

This is an interactive map of all the listings within DC. Listings are shown in clusters until you zoom in.

You can click on any point to see its name, host name, price, room type, and property type.

Looking at the relationship between property type and neighbourhood: room vs. entire apartment/house.

NOT WORKING

Demand and Price Analysis

Look at the demand over the years since Airbnb listings first appeared in the DC area.

Look at the relationship between price and demand: do listing prices fluctuate with demand, and how do prices vary by day of the week?

To estimate demand, we will use the number of reviews as the indicator.

The dates in reviewsNUM need to be reduced to just the year. The full dates are confusing on the axis, and the bottom ticks can then be labeled with years only.
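As a rough illustration of reducing review dates to years and counting reviews per year as a demand proxy, here is a minimal Python sketch. The sample dates and the function name `reviews_per_year` are made up for the example, and it assumes the `date` column of the reviews table holds ISO-formatted strings such as `2019-06-15`.

```python
from collections import Counter
from datetime import date

# Hypothetical sample; the real values would come from the `date` column
# of the reviews table (assumed to be ISO-formatted strings).
review_dates = ["2010-11-03", "2019-06-15", "2019-07-01", "2021-12-30"]


def reviews_per_year(dates):
    """Count reviews per calendar year from ISO-formatted date strings."""
    return Counter(date.fromisoformat(d).year for d in dates)


counts = reviews_per_year(review_dates)
for year in sorted(counts):
    # One row per year, suitable for plotting with year-only axis ticks.
    print(year, counts[year])
```

Plotting these yearly counts instead of the raw dates would leave one tick per year on the x-axis.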

NEED TO FIX GRAPH WITH YEARS AND ANGLES

How is Airbnb priced across the year?

After seeing the pattern in demand, we wanted to see whether the pricing of the listings followed a similar trend.

To answer this question, we used the data from the ‘calendar’ table to look at the daily average prices of the listings through time.

As the year advances, the average price of all listings tends to rise, peaking in December. Except in November and December, when the number of reviews (an indicator of demand) begins to fall, the pattern is identical to that of the number of reviews/demand. This appears to be counter-intuitive, as one would anticipate the price to fall as demand falls. This could be due to our assumption that the quantity of reviews reflects demand, which isn’t always the case.

On the above graphs, we can also notice two sets of points indicating that average prices on certain days were greater than on other days. To further understand this phenomenon, we’ll create a box plot showing average prices by weekday.

We can see that prices are concentrated at a higher level on Fridays and Saturdays, that is, for weekend stays.
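The weekday comparison can be sketched in a few lines of Python. This is only an illustration under assumptions: the rows below are invented, the price format (`$150.00`) is assumed rather than taken from the data, and `parse_price` and `mean_price_by_weekday` are hypothetical helper names.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical rows shaped like the calendar table: (date, price).
calendar_rows = [
    ("2022-01-07", "$150.00"),  # Friday
    ("2022-01-08", "$160.00"),  # Saturday
    ("2022-01-10", "$100.00"),  # Monday
    ("2022-01-14", "$170.00"),  # Friday
]


def parse_price(text):
    """Turn a price string such as '$1,250.00' into a float (assumed format)."""
    return float(text.replace("$", "").replace(",", ""))


def mean_price_by_weekday(rows):
    """Group prices by weekday name and return the mean for each weekday."""
    prices = defaultdict(list)
    for day, price in rows:
        weekday = WEEKDAYS[date.fromisoformat(day).weekday()]
        prices[weekday].append(parse_price(price))
    return {weekday: mean(values) for weekday, values in prices.items()}


for weekday, avg in mean_price_by_weekday(calendar_rows).items():
    print(weekday, round(avg, 2))
```

A box plot would show the full distribution per weekday; the mean here is just the simplest summary of the same grouping.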

Occupancy Rate by Month

I’ll end this section’s examination by looking at the occupancy forecast for the coming year.

We will use the ‘calendar’ table to determine the percentage occupancy for the next year, i.e., what proportion of apartments have already been booked as of November 3, 2018 (the day the data was obtained). We were unable to get historical occupancy data and, as a result, were unable to investigate what the real occupancy rates are.
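The occupancy calculation could look roughly like the following Python sketch. It assumes that the `available` column uses `"t"` for available and `"f"` for not available, and that a night that is not available counts as booked; both are assumptions for illustration, and the sample rows are invented.

```python
from collections import Counter

# Hypothetical rows shaped like the calendar table: (date, available).
calendar_rows = [
    ("2022-01-01", "f"),
    ("2022-01-02", "t"),
    ("2022-02-01", "f"),
]


def occupancy_by_month(rows):
    """Return the percentage of listing-nights booked per month (YYYY-MM)."""
    booked = Counter()
    total = Counter()
    for day, available in rows:
        month = day[:7]  # 'YYYY-MM' prefix of an ISO date
        total[month] += 1
        if available == "f":  # assumed: not available means booked
            booked[month] += 1
    return {m: 100.0 * booked[m] / total[m] for m in sorted(total)}


for month, pct in occupancy_by_month(calendar_rows).items():
    print(month, f"{pct:.1f}%")
```

Note that a host can also block nights without a booking, so this proxy may overstate true occupancy.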

THIS SECTION NOT WORKING STILL NEED TO DOWNLOAD makeR

USER REVIEW (TEXTUAL DATA) MINING

Building word vectors from Reviews

A word cloud is effective at showing what guests are looking for, but it is quite broad. Wouldn’t it be wonderful if we could find out what people think about the room sizes? Or investigate what makes guests “uncomfortable”?
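To give a feel for how word vectors answer a question like “which words go with uncomfortable?”, here is a small Python sketch. It is not the trained embedding model used in this report: as a stand-in, it builds raw co-occurrence count vectors from a few invented sentences and ranks words by cosine similarity. The function names and the window size are assumptions for the example.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


def most_similar(vectors, query, top_n=3):
    """Rank all other words by similarity to `query`, highest first."""
    query_vec = vectors.get(query, Counter())
    scores = [(word, cosine(query_vec, vec))
              for word, vec in vectors.items() if word != query]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


# Invented review snippets, for illustration only.
sentences = [
    "the bed was uncomfortable",
    "the bed was hard",
    "the host was friendly",
]
vectors = cooccurrence_vectors(sentences)
print(most_similar(vectors, "uncomfortable"))
```

Trained embeddings compress these co-occurrence statistics into dense vectors, but the nearest-neighbour lookup works the same way.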

Comment analysis using word cloud

Let’s start by looking at the most common topics in the reviews; just creating a word cloud should be enough. Word clouds take a frequency count of the words in the corpus as input and produce a visually appealing representation of the dominant (frequently occurring) words, with their size proportional to their frequency. We have over a million reviews, so we need to take a random sample, in this case 30,000 reviews. Although the sampled dataset is small compared to the original, it serves our purpose well because we only need the basic terms here. As we’ll see in the next section, further study of “good” and “negative” reviews will require more data.
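The input to a word cloud is just a word frequency table built from a random sample of reviews. A minimal Python sketch of that step follows; the stop-word list, the sample reviews, and the function name `word_frequencies` are all invented for illustration, and a real analysis would use a fuller stop-word list.

```python
import random
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the one used here).
STOP_WORDS = {"the", "a", "and", "was", "is", "to", "in", "of", "very"}


def word_frequencies(reviews, sample_size, seed=0):
    """Sample reviews at random and count the non-stop-words they contain."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = rng.sample(reviews, min(sample_size, len(reviews)))
    counts = Counter()
    for text in sample:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in STOP_WORDS:
                counts[word] += 1
    return counts


# Invented reviews, for illustration only.
reviews = [
    "Great location and very clean",
    "Clean room, great host",
    "The location was great",
]
freq = word_frequencies(reviews, sample_size=3)
print(freq.most_common(3))
```

The resulting counts determine the size of each word in the cloud.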

These are the words most associated with “uncomfortable”.

## INFO  [17:52:51.984] epoch 1, loss 0.1989 
## INFO  [17:53:20.440] epoch 2, loss 0.1260 
## INFO  [17:54:04.058] epoch 3, loss 0.1029 
## INFO  [17:54:22.448] epoch 4, loss 0.0917 
## INFO  [17:54:41.971] epoch 5, loss 0.0844 
## INFO  [17:54:59.547] epoch 6, loss 0.0791 
## INFO  [17:55:17.759] epoch 7, loss 0.0750 
## INFO  [17:55:36.989] epoch 8, loss 0.0718 
## INFO  [17:55:58.556] epoch 9, loss 0.0691 
## INFO  [17:56:17.389] epoch 10, loss 0.0668 
## INFO  [17:56:39.015] epoch 11, loss 0.0649 
## INFO  [17:56:58.215] epoch 12, loss 0.0633 
## INFO  [17:57:15.805] epoch 13, loss 0.0618 
## INFO  [17:57:34.146] epoch 14, loss 0.0606 
## INFO  [17:57:52.485] epoch 15, loss 0.0595 
## INFO  [17:58:10.707] epoch 16, loss 0.0585 
## INFO  [17:58:28.703] epoch 17, loss 0.0577 
## INFO  [17:58:46.711] epoch 18, loss 0.0569 
## INFO  [17:59:04.888] epoch 19, loss 0.0562 
## INFO  [17:59:22.930] epoch 20, loss 0.0556

These represent the good words pulled from the reviews.

Now analyzing demand and supply: Airbnb customer growth vs. listing prices over time

Some code has been copied from work by Ankit Peshin, Sarang Gupta, and Ankita Agrawal.